Interconnection network for a Field Programmable Gate Array

ABSTRACT

An interconnection network architecture that is especially useful for FPGAs is described. Based upon Benes networks, the resulting network interconnect is rearrangeable so that routing between logic cell terminals is guaranteed. Upper limits on time delays for the network interconnect are defined and pipelining for high speed operation is easily implemented. The described network interconnect offers flexibility so that many design options are presented to best suit the desired application.

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] This patent application claims priority from Provisional Patent Application No. 60/223,047, filed Aug. 4, 2000, which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] The present invention relates to integrated circuit interconnections and, in particular, to the interconnection architecture of FPGA (Field Programmable Gate Array) integrated circuits.

[0003] FPGAs are integrated circuits whose functionalities are designated by the users of the FPGA. The user programs the FPGA (hence the term, “field programmable”) to perform the functions desired by the user.

[0004] A very significant portion of an FPGA's design is the integrated circuit's interconnection network between the logic cells or blocks, which perform the functions of the FPGA. Heretofore, the current practice for designing an FPGA interconnection architecture has been empirical and on an ad hoc basis. The goal of the FPGA designer has been to create an interconnect structure which is sufficiently flexible to implement the required wiring for any circuit design intended for the FPGA, and yet occupies a minimal amount of area of the integrated circuit and with a minimal amount of transmission delay. In today's FPGA products, the interconnect network typically occupies about 90% of the chip area and the actual logic cells occupy only about 5% of the chip. In other words, most of the area of the integrated circuit is not dedicated to the circuits performing desired functions of the FPGA, but rather to the interconnections between those circuits.

[0005] Furthermore, the current practice for designing FPGA interconnects is empirical and on an ad hoc basis. The users of these FPGA products spend most of their design time trying to make their circuits route to obtain the desired functions and to meet the timing constraints. The rule of thumb is to only utilize 50% of the available logic cells in order to guarantee they can all be routed through the interconnect network. If the timing constraints are relatively high speed, then the rule of thumb is to only utilize 33% of the logic cells in order to avoid the need for detours and longer delays in the routing.

[0006] Hence, there is a need for an FPGA interconnection network architecture by which routing through the resulting interconnect network is guaranteed and the timing constraints of the interconnect network are predictable. The present invention provides for such an interconnection network.

SUMMARY OF THE INVENTION

[0007] The present invention provides for an integrated circuit having a plurality of logic cells; and a programmable network interconnecting the logic cells. The programmable interconnection network has a plurality of interconnection network input terminals; a plurality of programmable switches, each programmable switch having a plurality of input terminals and output terminals with the programmable switch arranged so that signals on any input terminal are passed to any output terminal. The plurality of programmable switches interconnecting the plurality of interconnection network input terminals to the interconnection network output terminals are arranged in a Benes network so that connections between the interconnection network input terminals and interconnection network output terminals are rearrangeable.

[0008] The plurality of programmable switches are arranged in hierarchical levels with a first level of the programmable switches having input terminals connected to the interconnection network input terminals and a last level of the programmable switches having output terminals connected to the interconnection network output terminals. The levels of the programmable switches intermediate the first and last level are arranged in a plurality of first rank sub-interconnection networks equal to the number of switch output terminals. Each first rank sub-interconnection network is connected to an output terminal of each programmable switch in the first level and connected to an input terminal of each programmable switch in the last level. In a similar arrangement, the first rank sub-interconnection networks themselves are formed from second rank sub-interconnection networks and so forth.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a diagram of the interconnection architecture of a current SRAM-based FPGA;

[0010] FIG. 2A is a diagram of one operation of a 2×2 switch for a Benes network; FIG. 2B is a diagram of another operation of a 2×2 switch for a Benes network; FIG. 2C is a block diagram of the elements of a 2×2 switch;

[0011] FIG. 3A illustrates the organization of an 8×8 Benes network with 2×2 switches in accordance with one embodiment of the present invention; FIG. 3B illustrates the interconnection between the switches at the first rank of hierarchy; FIG. 3C illustrates the interconnection between the switches at the next lower rank of hierarchy; FIG. 3D shows the complete interconnection of the 8×8 Benes network; FIG. 3E is an example of a permutation of connections in the 8×8 Benes network to reverse the order of input signals at the output terminals of the network;

[0012] FIG. 4A shows how the 8×8 Benes network is folded for an FPGA interconnection network in accordance with an embodiment of the present invention; FIG. 4B shows the resulting folded Benes network; FIG. 4C illustrates the FIG. 4B folded network in which the interconnections have been inverted by level;

[0013] FIG. 5 shows two exemplary logic cells connected to a combined switch of the FIG. 4C network in which the combined switch provides for corner turn routing in accordance with an embodiment of the present invention;

[0014] FIG. 6A illustrates the four elementary states of the combined switch; FIG. 6B illustrates 10 additional states of an enhanced combined switch for corner turn routing in accordance with the present invention;

[0015] FIG. 7 is a block diagram of the enhanced combined switch described with respect to FIGS. 6A and 6B;

[0016] FIG. 8 illustrates the four states with the input stage of the combined switch having fanout capability;

[0017] FIG. 9 illustrates an exemplary arrangement to create fanout functions with an FPGA interconnect network in accordance with an embodiment of the present invention;

[0018] FIG. 10 shows a pair of enhanced switches with timing latches in accordance with an embodiment of the present invention;

[0019] FIG. 11A is a block diagram of an exemplary circuit pipeline with mismatched delay paths; FIG. 11B is a block diagram of a modified FIG. 11A circuit pipeline with the delay paths corrected in accordance with an embodiment of the present invention;

[0020] FIG. 12 is a flow chart of a software generator for an FPGA, in accordance with an embodiment of the present invention;

[0021] FIG. 13A illustrates an exemplary column-based floorplan layout of an FPGA according to an embodiment of the present invention; FIG. 13B illustrates how two of the FIG. 13A columns are interconnected;

[0022] FIG. 14A shows the column-based layout of FIG. 13B with all the cells labeled; FIG. 14B illustrates the same topological network arranged in a tree-based layout in accordance with the present invention; and FIG. 14C shows a modification of the FIG. 14B tree-based layout with wirelengths between switch levels minimized.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0023] Current SRAM (Static Random Access Memory)-based FPGA products conform to the interconnect architecture as illustrated in FIG. 1. The basic structure of FIG. 1 has logic cells 10, which implement the desired circuit logic by the user, connection cells 11 which connect logic cells 10 to the interconnect network, and switch cells 12 which implement the interconnect network. Additional connections are made between a switch cell 12 and its four neighboring switch cells 12 in the north, east, west, and south directions. The switch cells 12, connection cells 11, and all their wires and connections constitute the interconnect network of the FPGA. This basic unit is arrayed to build FPGAs of varying sizes.

[0024] The flexibility of this architecture lies within the connection cell 11 and the switch cell 12. In common terminology, a fully “populated” connection cell 11 will connect each pin of the logic cell 10 to every wire connecting to the switch cell 12. A “depopulated” connection cell 11 will connect each pin of the logic cell to a subset of the wires connecting to the switch cell 12, with each pin connecting to a different, possibly overlapping, subset of wires. Similarly, a fully “populated” switch cell will provide full crossbar connections between all the wires on all four of its sides, and a “depopulated” switch cell will only provide a subset of these connections. Lastly, the set of wires between any two cells is called a “channel”, and the number of wires in a channel can be varied.

[0025] Each possible connection in the FPGA interconnect network requires its own pass gate and controlling configuration bit. A fully populated interconnect network is prohibitively expensive to implement and the current practice has been to build a parameterized software model that can represent varying depopulated interconnect networks. Then various representative logic designs are tried on the modeled networks. Based on this empirical data, a judgment must be made about what constitutes an “acceptable” interconnect network in terms of routability versus implementation cost. This is an ad hoc process since there are no theoretical guarantees of routability, i.e., that the desired interconnections can actually be made.

[0026] A further complication in the above empirical process has been that the demands on the interconnect network do not scale linearly with the number of logic cells in the array. In other words, an interconnect network that seems to route most designs on an array with 1 K logic cells cannot simply be replicated for a 64 K logic cell array. As seen empirically, the routing demands grow exponentially, but these demands are highly dependent on the exact algorithms used to implement the design. Specifically, they depend on the algorithms used to map the original circuit design onto the logic cells, to place the logic cells on the array, and to route (connect) the logic cells to each other. There is currently no precise theoretical model of this growth in wiring demand, although current practice has been to approximate the wiring demand with stochastic models. The use of these models entails some assumptions for certain coefficients, which are based on empirical data, and so current practice is still an ad hoc process.

[0027] In contrast, the present invention provides for an FPGA interconnection network architecture which creates interconnection networks which are “rearrangeable,” i.e., any permutation of interconnections from the network's input terminals to the output terminals can be implemented. The resulting FPGA network interconnect has guaranteed routing with defined maximum timing delays and is scalable.

[0028] The present invention uses the so-called Benes network, which has been the subject of research in the telecommunications field, specifically for switching networks. Generally described, a Benes network interconnects a number of network input terminals to a number of network output terminals. Between the input and output terminals are switches, each switch itself having a number of input terminals and a number of output terminals and the ability to pass signals on any input terminal to any output terminal. The switches are connected in hierarchical levels with a first level of switches having input terminals connected to the network input terminals and a last level of the switches having output terminals connected to the network output terminals. The levels of the switches intermediate the first and last levels are arranged in a plurality of first rank sub-interconnection networks equal to the number of switch input (and output) terminals, each first rank sub-interconnection network connected to an output terminal of each switch in the first level and connected to an input terminal of each switch in the last level. The first rank sub-interconnection networks are formed by second level switches having input terminals connected to the output terminals of the first level switches and second-to-the-last level switches having output terminals connected to the input terminals of the last level of switches. The levels of switches intermediate the second and second-to-the-last level are arranged in a plurality of second rank sub-interconnection networks equal to the number of switch output terminals with each second rank sub-interconnection network connected to an output terminal of each second level switch and connected to an input terminal of each second-to-the-last level switch.

[0029] A switch level hierarchy is formed because each rank sub-interconnection network is formed like the rank sub-interconnection network above. That is, each rank sub-interconnection network is formed by a plurality of switches in one level, the switches having input terminals connected to output terminals of switches of a sub-interconnect network rank immediately higher; and a corresponding level of switches having output terminals connected to input terminals of switches of the sub-interconnect network rank immediately higher; and the levels of the switches intermediate the switches in the one and corresponding levels arranged in a plurality of lower rank sub-interconnection networks equal to the number of switch output terminals, each lower rank sub-interconnection network connected to an output terminal of each switch in the one level and connected to an input terminal of each switch in the corresponding level, so as to define the hierarchical level arrangement of the switches.

[0030] The particular Benes network described immediately below explains the switch hierarchy with specificity. This network is also useful to implement an FPGA according to the present invention.

[0031] Benes Network with 2×2 Switches

[0032] The building block of the described Benes network is the 2×2 (2 input, 2 output) switch 20, having operations illustrated in FIGS. 2A and 2B. The 2×2 Benes switch 20 has two possible configuration modes: pass and cross. In pass mode illustrated by FIG. 2A, a signal on input A is passed straight to output C, and a signal on input B is passed straight to output D. In cross mode illustrated by FIG. 2B, a signal on input A crosses over to output D and a signal on input B crosses over to output C. A single configuration or control bit can control these two modes.

[0033] The switching itself can be implemented with two 2:1 multiplexers or MUXs as shown by FIG. 2C. The switch 20 has two MUXs 21 and 22 having two input nodes which are each connected to one of the input terminals, input A or input B, of the switch 20. The output node of the MUX 21 forms the output terminal, output A, and the output node of the MUX 22 forms the output terminal, output B, of the switch 20. Both MUXs 21 and 22 are connected to a control line 23 which carries the configuration or control bit. The entire switch cell only requires 18 transistors in a CMOS (Complementary Metal-Oxide-Semiconductor) implementation of an integrated circuit.
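The two-MUX switch described above can be sketched in a few lines of Python. This is a minimal behavioral model only; the function names, the polarity of the control bit (0 is taken to mean pass mode) and the order of the MUX inputs are assumptions made for illustration and are not taken from FIG. 2C.

```python
def mux2(in0, in1, select):
    """2:1 multiplexer: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def switch_2x2(a, b, control_bit):
    """Behavioral model of a 2x2 switch built from two 2:1 MUXs.

    Assumed polarity: control_bit = 0 gives pass mode, 1 gives cross mode.
    Both MUXs share the single control bit, as with control line 23.
    Returns the two switch outputs as a tuple (first output, second output).
    """
    first_out = mux2(a, b, control_bit)   # plays the role of MUX 21
    second_out = mux2(b, a, control_bit)  # plays the role of MUX 22
    return first_out, second_out


print(switch_2x2("sigA", "sigB", 0))  # pass mode
print(switch_2x2("sigA", "sigB", 1))  # cross mode
```

Because both MUXs share one control bit, the model can only ever produce the pass and cross permutations, which is why one configuration bit suffices.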

[0034] These 2×2 switches are connected in a specific topology to build a Benes network. For the purpose of illustration, the arrangement of the 2×2 switches 20 in an 8×8 Benes network is shown in FIG. 3A. For a network with N inputs and N outputs, N being a power of 2, there are (2*(log₂N)−1) levels of switches, each level consisting of N/2 switches. In this example of an 8×8 network, each level has 4 switch cells and there are 5 levels. The interconnection between the switches 20 can best be understood by viewing the network in a hierarchical arrangement, starting from the outside and proceeding inwards. We can view the two outermost levels, levels 1 and 5, in detail and view the inner levels as hierarchical blocks, as illustrated by FIG. 3B. The inner levels can be viewed as two hierarchical blocks, an Upper Network 25 and a Lower Network 26. In level 1, each switch cell 20 has one output going to the Upper Network 25, and one output going to the Lower Network 26. Similarly in level 5, each switch cell has one input from the Upper Network 25, and one input coming from the Lower Network 26.
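The level and switch counts stated above follow directly from N. The short Python sketch below evaluates them; the function name and the returned dictionary keys are hypothetical and chosen only for this illustration.

```python
def benes_dimensions(n):
    """Level and switch counts for an n x n Benes network of 2x2 switches.

    n must be a power of 2 (at least 2). With one configuration bit per
    2x2 switch, the total switch count is also the configuration bit count.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of 2 and at least 2")
    log2_n = n.bit_length() - 1          # exact log2 for a power of 2
    levels = 2 * log2_n - 1
    switches_per_level = n // 2
    return {
        "levels": levels,
        "switches_per_level": switches_per_level,
        "total_switches": levels * switches_per_level,
    }


print(benes_dimensions(8))
```

For the 8×8 example this gives 5 levels of 4 switches, or 20 switches in all.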

[0035] At the next level of the hierarchy of the Benes network, the details of the Upper Network 25 and the Lower Network 26 are expanded in FIG. 3C. The Upper Network 25 is formed by switches 20 in levels 2 and 4, and Upper and Lower Networks 27 and 28 respectively. The Lower Network 26 is formed by switches 20 in levels 2 and 4, and its own Upper and Lower Networks 29 and 30 respectively. Each of these networks 27-30 is half the size of the higher level networks 25 and 26 and is similarly decomposed into its own Upper and Lower Networks. In this example of an 8×8 network, the bottom of the hierarchy has been reached since the lower level networks 27-30 are switches 20 in level 3. For larger networks, similar decomposition into the Upper and Lower Networks may be performed until the bottom of the hierarchy is reached. The complete interconnection of the constituent switches 20 in the 8×8 Benes network is illustrated by FIG. 3D.

[0036] The Benes network of FIG. 3D is not configured with only the hard-wired connections between the switch cells 20 illustrated. This network can potentially implement any permutation of signals on input terminals to output terminals. In order to configure the network to implement a specific routing, each switch cell 20 must be individually configured as either “pass” or “cross” mode described previously. The example of FIG. 3E shows the configuration of the network to implement an order reversal from the inputs to the outputs.
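One way to derive the pass/cross settings for a given permutation is the well-known recursive "looping" method for Benes networks, sketched below in Python. It is offered only as an illustration and is not asserted to be the routing algorithm of the present invention. The wiring convention is an assumption: switch k of the outer levels serves terminals 2k and 2k+1, its first output (or input) connects to the Upper Network and its second to the Lower Network, and True means cross mode. The function names are hypothetical.

```python
def route_benes(perm):
    """Switch settings (True = cross), one list per level, so that the signal
    on network input i reaches network output perm[i]."""
    n = len(perm)
    if n < 2 or n & (n - 1) or sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of a power-of-2 size")
    if n == 2:
        return [[perm[0] == 1]]
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    side = [None] * n                   # 0 = Upper Network, 1 = Lower Network
    for start in range(n):
        i = start
        while side[i] is None:          # walk one constraint cycle
            side[i] = 0
            j = inv[perm[i] ^ 1]        # input whose output shares a last-level switch
            side[j] = 1                 # it must use the other sub-network
            i = j ^ 1                   # its first-level partner goes Upper again
    first = [side[2 * k] == 1 for k in range(n // 2)]
    last = [side[inv[2 * k]] == 1 for k in range(n // 2)]
    upper, lower = [0] * (n // 2), [0] * (n // 2)
    for i in range(n):
        (lower if side[i] else upper)[i // 2] = perm[i] // 2
    inner = [u + l for u, l in zip(route_benes(upper), route_benes(lower))]
    return [first] + inner + [last]


def apply_benes(settings, signals):
    """Push signals through a network configured by route_benes."""
    n = len(signals)
    if n == 2:
        a, b = signals
        return [b, a] if settings[0][0] else [a, b]
    upper_in, lower_in = [], []
    for k, cross in enumerate(settings[0]):
        a, b = signals[2 * k], signals[2 * k + 1]
        upper_in.append(b if cross else a)
        lower_in.append(a if cross else b)
    q = n // 4                          # switches per level in each sub-network
    upper_out = apply_benes([lvl[:q] for lvl in settings[1:-1]], upper_in)
    lower_out = apply_benes([lvl[q:] for lvl in settings[1:-1]], lower_in)
    out = []
    for k, cross in enumerate(settings[-1]):
        c, d = upper_out[k], lower_out[k]
        out += [d, c] if cross else [c, d]
    return out


reversal = list(range(7, -1, -1))       # order reversal, as in the FIG. 3E example
settings = route_benes(reversal)
print(len(settings), "levels of", len(settings[0]), "switches")
print(apply_benes(settings, list("abcdefgh")))
```

The settings found for the order reversal need not match those drawn in FIG. 3E, since a permutation generally has more than one valid configuration.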

[0037] Note that there are many variations of the Benes network. The hierarchical sub-division into Upper and Lower networks can be generalized to more than 2 sub-networks, so networks of the size p^n, p > 2, can be constructed. Also, the sub-division does not require that the sub-networks be of equal size. This generalized construction leads to overall Benes networks with arbitrary numbers of inputs and a proportional number of switch cells. All variants are simply referred to as Benes networks.

[0038] The Benes network is a very powerful and efficient interconnection network with guaranteed routability. Its use has not been more widespread because of the complexity of the algorithm required to determine the appropriate configuration of the switches for a specific routing. The Benes network is “rearrangeable,” but not “non-blocking.” Non-blocking means that any input-to-output connection can always be made, even if there are already existing connections on the network. Rearrangeable is less powerful and means that any input-to-output connection can be made, but some existing connections may need to be rerouted. In the dynamic worlds of telephone switching and data communication networks, a Benes network would require that a routing algorithm be performed every time a new connection is requested. A Benes routing algorithm requires time O(N log₂N), but the network itself transmits data in time O(log₂N). It takes longer to reconfigure the network than to actually transmit the data through the network. Hence, current practice in data communications has been to use more expensive non-blocking switches.

[0039] However, the present invention recognizes that in the FPGA world, routing is not so dynamic. There is no real time set up and tear down of fleeting connections. Instead, in an offline process, a circuit design is mapped onto the FPGA integrated circuit once and the resulting interconnect configuration is used without change. Even in the application of FPGA technology to the burgeoning field of “reconfigurable logic”, multiple configurations may be rapidly swapped in and out of the FPGA, but each configuration itself is never changed. Presently, the offline routing process in an FPGA requires on the order of minutes or even hours of execution time. In contrast, the execution of a Benes routing algorithm in accordance with the present invention requires on the order of 10 seconds (which is completely unacceptable in a data communications network). This time is spectacularly fast and routability is guaranteed.

[0040] Specific Implementation of Benes Network in FPGAs

[0041] There are a number of ways that the Benes network may be adapted to make it more efficient as an interconnection network for an FPGA or MPGA (Mask Programmable Gate Array). In an FPGA, the logic is composed from building blocks called “logic cells”, and the logic cells contain both input and output pins. An example of a typical logic cell is a 2-input NAND gate. So an FPGA interconnection network should have neighboring “leaf cells” which correspond to the logic cells and which contain both inputs and outputs to make the connections to the logic cells. This can be accommodated by “folding in half” the original Benes network, and combining the switch cells 20 from the first level and last level, the second level and second-to-last level, and so on. This is illustrated in FIG. 4A with the 8×8 network example. The “folding” is made along the dotted line 31 which runs through the switches 20 in level 3. The switches 20 are labeled by location in the network to maintain identification through the folding operation. The first number in the labels identifies the level of the switch and the second number its row location. Hence the switch with label, “4.3,” is in level 4 and row 3.

[0042] The resulting folded network is illustrated by FIG. 4B. The switches 20 are combined in pairs, with the formerly level 3 switches duplicated for uniformity. While the combined switches 32 represent a topological change of the 8×8 Benes network, it should be noted that the connections between the cells 20 remain the same. The combined input and output switch cells 32 on the left of the folded network, e.g., combined switches 1.2 and 5.2, form the leaf cells for the connection to the pins of the FPGA logic cells.

[0043] From the connections between the combined switches 32, the network of FIG. 4B can be turned “inside out”, that is, the innermost levels of the combined switches 32 become the outermost and vice versa, as illustrated in FIG. 4C, without affecting the routability of the interconnection network. The levels with shorter connections are moved closer to the logic cells. This is more suitable for an FPGA.

[0044] Corner Turning for Interconnection Network

[0045] With inputs and outputs combined into a single switch cell 32, shorter routes between logic cells which don't travel through all 2*(log₂N) levels of switches can be configured. In the original Benes network, every route must travel through all the levels to go from input to output. In the adapted interconnection network, signals from the logic can “turn the corner” before reaching the opposite side of the network. For example, in FIG. 5, logic cell 41 has an output pin that must be routed to an input pin on logic cell 42.

[0046] Of course, the particular advantage of corner turning in the interconnection network depends on the quality of the logic cell placement algorithm for the FPGA. (Note that “placement” for an FPGA logic cell is not the physical placement of selected logic gates to form a desired function, but rather the programming of a selected logic cell to perform the desired function.) The algorithm is designed to minimize the distance between connected logic cells, where distance is not defined as it usually is for an FPGA or MPGA. The usual definition of distance in a placement algorithm is either Euclidean or Manhattan. In the present interconnection network, distance is defined as the depth of the first common ancestor in the network because a corner can be turned at this point. The most appropriate placement algorithms build cluster trees with capacity constraints, either top-down or bottom-up. Nonetheless, regardless of the quality of logic cell placement, the present invention still provides the original worst case bound of 2*(log₂N) switches, no matter how highly the network is utilized. In contrast, current FPGA products cannot guarantee a worst case bound on signal delay when the integrated circuit is highly utilized.
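The distance measure described above can be illustrated with a small Python sketch. It rests on a simplifying assumption that is not stated in the description: leaf terminals are numbered 0 to N−1 so that the sub-networks nest like a binary tree over those numbers, with each leaf switch serving terminals 2k and 2k+1. Under that hypothetical numbering, the depth of the first common ancestor can be read from the differing high-order bits of the two terminal numbers. The function names are likewise hypothetical.

```python
def common_ancestor_depth(term_a, term_b):
    """Depth of the first switch level shared by two leaf terminals.

    Assumed numbering (hypothetical): terminals 2k and 2k+1 attach to the
    same leaf switch (depth 1); each further level merges two groups.
    """
    return ((term_a >> 1) ^ (term_b >> 1)).bit_length() + 1


def placement_cost(connections):
    """Sum of common-ancestor depths over (source, destination) pairs.

    A placement algorithm of the kind described would try to keep this low.
    """
    return sum(common_ancestor_depth(a, b) for a, b in connections)


# For N = 8 terminals the deepest possible turn is at depth log2(8) = 3.
print(common_ancestor_depth(0, 1))  # same leaf switch
print(common_ancestor_depth(0, 2))  # neighboring leaf switches
print(common_ancestor_depth(0, 7))  # opposite ends of the array
```

Under this model, terminals that are placed in the same small cluster turn the corner early, while terminals in different halves of the array must climb to the top level.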

[0047] Enhanced Switch for FPGAs

[0048] Corner turning requires that the original Benes switch be enhanced. It should be noted that the original switch had 2 states responsive to 1 configuration bit. See the description above with respect to FIGS. 2A and 2B. Just by combining the input and output switch cells, the combined switch 32 has 4 states and requires 2 configuration bits. FIG. 6A illustrates the four permutations of passing signals from the input terminals to the output terminals for the combined switch 32.

[0049] The corner turning feature adds 5 more states for the “output” lower half of the combined switch 32. When multiplied by 2 states for the “input” upper half, there are a total of 10 new states for the combined switch 32. These additional 10 states are illustrated by FIG. 6B. It should be noted that for the corner turning states shown, there are only two possible paths to turn a corner: Each of the two possible corner turning outputs can only be connected to one of the inputs, not both inputs. The unconnected input comes from the same switch as the one the output is going to. While there may be some possible use for this connection in terms of selectively adding variable delays to certain routes, the cost of implementing additional configuration bits to all combined switch cells to support these paths is unjustified.

[0050] Of course, the increased number of states for the combined switch cannot be satisfied by the two-MUX structure of FIG. 2C. FIG. 7 is a block diagram of the combined switch cell 32, which is formed by MUXs 61-66 which operate by the setting of configuration bits on five control line nodes 71-75. MUXs 61 and 62 each have terminal nodes connected to inputs A and B; control line node 71 is connected to MUX 61 and control line node 72 is connected to MUX 62. The output nodes of MUX 61 and 62 form the outputs of the combined switch 32. MUXs 63 and 65 are connected so that the output node of the MUX 63 forms one input node of the MUX 65; in effect, the MUXs 63 and 65 form a 3:1 MUX. The input nodes of the MUX 63 are connected to the reverse direction inputs C and D, and the second input node of the MUX 65 is connected to the input B. The MUXs 64 and 66 form a second 3:1 MUX. The output node of the MUX 64 forms one input node of the MUX 66. The input nodes of the MUX 64 are also connected to the reverse direction inputs C and D, and the second input node of the MUX 66 is connected to the input A. The output nodes of MUX 65 and 66 form the reverse direction outputs of the combined switch 32. The control line node 73 is connected in common to the MUXs 63 and 64. The control line node 74 is connected to the MUX 65 and the control line node 75 is connected to the MUX 66.
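The six-MUX structure described above can be modeled behaviorally to count how many distinct routing states the five configuration bits actually produce. The Python sketch below is illustrative only. The select polarities and the input ordering of each MUX are assumptions; in particular, MUXs 63 and 64 are assumed to be wired in complementary order so that their shared control bit selects between pass and cross for the reverse direction.

```python
from itertools import product


def mux2(in0, in1, select):
    """2:1 multiplexer: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def enhanced_switch(a, b, c, d, bits):
    """Behavioral model of the six-MUX combined switch.

    bits = (b71, b72, b73, b74, b75), one per control line node.
    Returns (forward out 1, forward out 2, reverse out 1, reverse out 2).
    """
    b71, b72, b73, b74, b75 = bits
    fwd1 = mux2(a, b, b71)      # MUX 61: input A or B
    fwd2 = mux2(b, a, b72)      # MUX 62: input B or A (independent bit allows fanout)
    m63 = mux2(c, d, b73)       # MUX 63: reverse input C or D
    m64 = mux2(d, c, b73)       # MUX 64: complementary order (assumption)
    rev1 = mux2(m63, b, b74)    # MUX 65: MUX 63 output, or corner turn from B
    rev2 = mux2(m64, a, b75)    # MUX 66: MUX 64 output, or corner turn from A
    return fwd1, fwd2, rev1, rev2


# Enumerate all 32 bit patterns and keep the distinct routing results.
states = {enhanced_switch("A", "B", "C", "D", bits)
          for bits in product((0, 1), repeat=5)}
print(len(states))
```

Under these assumptions the 32 bit patterns collapse to 28 distinct states, because the shared bit on control line node 73 has no effect when both reverse outputs turn the corner. This agrees with the 28 states the description attributes to the enhanced switch.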

[0051] One further enhancement is required for an interconnection network which is highly suitable for FPGAs. That enhancement is fanout support. In an FPGA, the outputs frequently fan out to multiple inputs. At the switch level, fanout can be supported in either the “output” half or the “input” half of the combined switch 32. However, in terms of routability for one-to-many connections, the fanout must be in the input half of the combined switch in order to break cyclic dependencies. Therefore, the preferred embodiment of the combined switch cell 32 has 4 states in the input half of the switch, as represented by the four states in FIG. 8.

[0052] An alternative way of creating the fanout function is with the use of logic cells which are connected through MUXs to the interconnect network. Such an arrangement avoids the placing of additional functionality upon the interconnect network itself. An example of this arrangement is shown in FIG. 9. Each logic cell is a 4-LUT (4 input Look Up Table). There are four 4-LUTs 76-79 (having a total of 16 inputs) with 4 outputs A-D respectively. These outputs A-D are connected to the input nodes of each of 16 MUXs 80 which have a total of 16 outputs. These outputs (as the inputs to the 4-LUTs 76-79), in turn, are connected to the enhanced combined switch cells of the first (and last) levels of the described Benes interconnect network. Through control signals on the MUXs, the outputs A-D can be selectively placed into the interconnect network. With the repetition of one of the 4-LUT outputs A-D into the interconnect network, a fanout is effectively created.

[0053] Hence, with the 7 possible states on the output half of the combined switch, the enhanced switch has a total of 28 states. A switch cell appropriate for an FPGA interconnect network has been created from a simple 2-state switch cell which requires 1 configuration bit and is capable of being implemented with 18 transistors in CMOS. The 28-state combined switch cell requires 5 configuration bits and can be implemented with 74 transistors in CMOS. The most expensive enhancement, in terms of silicon area, is the corner turning feature. Without corner turning, the combined switch cell would only have 8 states, which require 3 configuration bits and can be implemented in 46 transistors in CMOS. This is about a 38% reduction in silicon area for the interconnect network alone. For the purpose of analysis, assuming the logic cell is a 4-LUT (4-input Look-Up Table) with a latched output and the array is built with 16 K logic cells (a 64 K gate equivalent), a 33% reduction in the total FPGA area may be achieved. Table A below compares the results of a combined switch cell with and without corner turning:

TABLE A

                        Switch Cell   Logic Cell    Area Per     FPGA         FPGA Area
                        Transistors   Transistors   Transistor   Total Area   Percentage
With Corner Turning     74            270           0.000004     153          100%
Without Corner Turning  46            270           0.000004     102           67%
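The state counts, configuration bit counts and percentage reductions quoted above can be cross-checked with simple arithmetic. The Python sketch below does so using only the figures given in the text and in Table A; the helper name `config_bits` is hypothetical, and the sketch does not attempt to reproduce how the total area figures themselves were derived.

```python
def config_bits(states):
    """Minimum number of configuration bits needed to select among `states`."""
    return (states - 1).bit_length()


# State counts: input half (4 with fanout) times output half (2 or 7).
states_with_corner = 4 * 7       # 28
states_without_corner = 4 * 2    # 8

for name, states in [("original 2x2 switch", 2),
                     ("combined switch, no fanout", 4),
                     ("without corner turning", states_without_corner),
                     ("with corner turning", states_with_corner)]:
    print(f"{name}: {states} states, {config_bits(states)} configuration bits")

# Transistor and area figures as quoted in the text and Table A.
switch_reduction = (74 - 46) / 74
area_ratio = 102 / 153
print(f"switch cell reduction: {switch_reduction:.0%}")
print(f"total area without corner turning: {area_ratio:.0%} of the area with it")
```

The bit counts of 1, 2, 3 and 5 and the rounded figures of 38% and 67% agree with the values stated in the description.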

[0054] As discussed previously, corner turning is highly desirable for reducing the signal delay due to routing. The FPGA user should be able to make the design tradeoff whether a specific project needs a faster chip or a smaller chip. The interconnect network according to the present invention provides many options to the FPGA user. An even higher speed, larger area alternative is discussed below.

[0055] Pipelined Interconnect and Predictable Delays through the Interconnect Network

[0056] Even with the enhanced combined switch, the present invention provides for further improvement. The biggest problem in semi-custom VLSI (Very Large Scale Integration) design today is the signal delays due to an integrated circuit's interconnection network. With deep sub-micron fabrication processes and their thin resistive wires, the delay due to interconnect dominates any delay due to the logic cells. This problem is even worse in an FPGA because there is the additional interconnect delay due to the switch cells in the routing. The difficulty arises in trying to either predict or constrain the routing delays.

[0057] The current practice for VLSI design is to estimate the delay due to routing during the logic design stage. The estimation is done by statistical wire loading models or by rules-of-thumb which limit the levels of logic between clock cycles. Then the actual placement and routing of the logic is performed, and the prior estimates are usually passed on as constraints to these algorithms. However, constraint driven place-and-route algorithms are still an open problem, and so there still must be a timing verification stage after these programs are run. Usually there are timing violations and the designer has different options. He or she can try to tweak the placement and routing to meet the timing constraints. This is largely a matter of patience and luck. If this fails, the designer can go back and modify the logic design based on the actual timing from the place-and-route. Then the designer runs the place-and-route routine again with the hope that the process will converge and there will be no new timing violations. However, placement algorithms have been theoretically proven to be highly sensitive to even small changes, so the delay profile of the modified logic design may be very different from what was hoped. This is usually a highly iterative and lengthy process. In practice, VLSI designers have learned not to be aggressive with their timing estimates and constraints, so that the process will converge more rapidly. Most current FPGA designs only run at 50-100 MHz.

[0058] With an interconnect network according to the present invention, a totally different design methodology is possible. All the problems arising from variable interconnect delays and the need to predict and constrain them are avoided. The described interconnect network provides for a uniform multi-stage network as illustrated in FIG. 10. A representative pair of neighboring enhanced switches 48 and 49 is shown. The inputs of the switch cells 48 and 49 have latches 50 to pipeline the signals routed through the network. The latches 50 in the switch cell 48 are responsive to one edge of a clock signal and the latches 50 in the switch cell 49 are responsive to the other edge of the clock signal. Alternating switch levels in the interconnect network can thus be triggered on the rising or the falling edge of a clock signal. In a fully pipelined design, the clock can operate as fast as the slowest stage in the network allows, and should be capable of supporting clock rates up to 1000 MHz.

[0059] To maximize throughput, every switch level of the interconnect network may be latched. On the other hand, if such a high clock rate is not needed, only every few levels may be latched, for a lower clock rate and throughput. Each latch requires 10 transistors to implement, so that each unlatched switch cell is 46% smaller than its latched version. Alternatively, latches may be included in every level, but with a 2:1 MUX, one input being the output of the latch and the other being the input to the latch. The MUX serves as a field programmable bypass to the latch, and allows field control of the number of switch levels between the latches. In this manner, the number of switch levels between latches and whether they include the bypass MUXs is placed under the control of the FPGA user.

[0060] For a fully pipelined design, a logic cell's input signals must arrive at the same time. The described interconnect network can disable corner turning (either in the routing algorithm or in the FPGA network generator) so that every route passes through exactly 2*(log₂N) levels and the delay is known a priori. Then the only source of signal delay variation arises from the differing levels of logic along the paths for different input pins for a given logic cell in the user's logic design.
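The fixed route length can be illustrated with a minimal sketch. It uses the 2*(log₂N) level count stated above; the function names and the per-level delay parameter `level_delay_ns` are assumptions introduced only for illustration and do not come from the described network.

```python
def route_levels(n: int) -> int:
    """Levels crossed by every route when corner turning is disabled,
    using the 2*(log2 N) count stated in the text."""
    if n < 2 or n & (n - 1):
        raise ValueError("N is assumed to be a power of two >= 2")
    return 2 * (n.bit_length() - 1)  # bit_length() - 1 == log2(n) for powers of two


def route_delay_ns(n: int, level_delay_ns: float) -> float:
    """Route delay, assuming a uniform (hypothetical) delay per switch level."""
    return route_levels(n) * level_delay_ns
```

Because the level count depends only on N, the route delay is a constant of the array rather than a result of place-and-route.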

[0061] FIG. 11A represents such an example. Two sets of data are presented in queues, one set for logic cell 51 and the other set for logic cell 52. The corresponding data from both sets are to be processed by logic cell 54. With the assumption that each route has a delay of 1 ns and each of the logic cells 51, 52 and 53 has a delay of 1 ns, the upper path (with logic cell 51) has a delay of 2 ns and the lower path (with logic cells 52 and 53 and a path between the two logic cells) has a delay of 4 ns. This design will not operate correctly in a fully pipelined mode. By the time Data 1 arrives at logic cell 54 along the bottom path, Data 1 has already passed logic cell 54 along the upper path. Instead Data 2 is present.

[0062] But with a minor modification, the present invention allows the design to be pipelined and to operate with a clock period of 1 ns. FIG. 11B shows the insertion of a buffer 55 along the upper path between the logic cells 51 and 54. Buffers can be inserted without affecting the logic of the design. In fact, most commercial place-and-route tools will insert buffers on signals with long routing lines in order to improve their overall delay; this is done transparently for the user. The present invention allows buffer insertion to be performed simply because the signal delays through the interconnect network are known.

[0063] Hence the present invention offers a methodology for fully pipelined design as follows: analyze a given netlist to identify existing mismatches in delay paths; optionally, the user may modify his or her netlist to eliminate the mismatches; for each mismatch, insert buffers to lengthen the shorter paths until the delay paths match; determine what size array is required for the modified netlist; and perform place-and-route without corner turning. This methodology does not require iteration as current methodologies do. This is because of two properties of the interconnection network of the present invention: 1) the delay of every routing path is known a priori; and 2) 100% routability for a given array is guaranteed, a property of the Benes network.
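The buffer-insertion step of this methodology can be sketched as follows. This is only an illustration under stated assumptions, not the tool flow of the invention: the netlist is modeled as a dictionary mapping each cell to its list of driver cells, every logic cell and buffer is assumed to take 1 ns, every route is assumed to take 1 ns (the figures used for FIG. 11A), and the names `balance`, `CELL_NS`, `ROUTE_NS` and `buf1` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

CELL_NS = 1   # assumed delay of any logic cell or buffer
ROUTE_NS = 1  # assumed delay of any route through the network


def balance(netlist):
    """Insert buffers so all inputs of every cell arrive at the same time.

    netlist maps each cell name to the list of cells driving it.
    Returns (balanced_netlist, ready), where ready[c] is the time at which
    the output of cell c is valid.
    """
    balanced, ready = {}, {}
    buffer_count = 0
    # static_order() yields drivers before the cells they drive.
    for cell in TopologicalSorter(netlist).static_order():
        drivers = netlist.get(cell, [])
        arrivals = [ready[d] + ROUTE_NS for d in drivers]
        latest = max(arrivals, default=0)
        new_drivers = []
        for driver, t in zip(drivers, arrivals):
            # With uniform delays every mismatch is a whole number of
            # (buffer + route) steps, so this loop lands exactly on `latest`.
            while t < latest:
                buffer_count += 1
                buf = f"buf{buffer_count}"
                balanced[buf] = [driver]
                ready[buf] = t + CELL_NS
                driver, t = buf, ready[buf] + ROUTE_NS
            new_drivers.append(driver)
        balanced[cell] = new_drivers
        ready[cell] = latest + CELL_NS
    return balanced, ready
```

Applied to the FIG. 11A example, `balance({"51": [], "52": [], "53": ["52"], "54": ["51", "53"]})` places one buffer between cells 51 and 54, which corresponds to buffer 55 of FIG. 11B, so both inputs of cell 54 arrive after 4 ns.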

[0064] The described methodology supports fully pipelined operation at very high clock rates. It should be noted that pipelining yields a signal processing throughput proportional to the clock rate, but the signal processing latency is still proportional to the levels of logic and interconnect.

[0065] Latency Control

[0066] The present invention permits even further efforts to reduce latency. One potential drawback of using a multi-stage network as an interconnect network is the potentially long latency of a route. Although corner turning reduces the average length of the routing, the worst case length is still 2*(log₂N) levels, as explained above. While the performance of an FPGA with the described interconnection architecture is superior to that of existing FPGA products, there is room to control worst case latency. This can be done without giving up guaranteed routability, known delays, or pipeline support, but at the expense of more silicon area.

[0067] Because of the hierarchical structure of a Benes network, the Benes network can be recursively constructed. The Upper Network and Lower Network are themselves expanded into Benes networks, each with half the number of inputs and outputs of the original network. See FIG. 3B. In essence, each of these sub-networks simply guarantees a means of routing any of its inputs to any of its outputs. Functionally, this is a crossbar switch. A Benes network is a much more area efficient method of implementing a rearrangeable crossbar. The size of a Benes network grows by N*log₂N, whereas the size of a crossbar grows by N². However, maximum latency can be reduced if just the lowest levels of Benes networks are substituted with crossbars. The following Table B illustrates the relative areas, in 0.18-micron technology, for a 16 K logic cell array:

TABLE B

Inputs to      Levels     Benes Total    Crossbar Total
Sub-Network    Reduced    (mm²)          (mm²)
16             7          9.7            11.0
32             9          12.1           24.2
64             11         14.5           54.3
128            13         17.0           122.0
256            15         19.4           273.7
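The two growth rates can be compared with a short sketch. It uses the textbook counts for a Benes network built from 2×2 switches (2*log₂n − 1 switch stages of n/2 switches each) and for an n×n crossbar (n² crosspoints). These are element counts only, not the silicon areas of Table B, and the stage count may differ by one from the level count used elsewhere in the text. The function names are introduced here for illustration.

```python
def benes_stages(n: int) -> int:
    """Textbook number of 2x2 switch stages in an n-input Benes network."""
    if n < 2 or n & (n - 1):
        raise ValueError("n is assumed to be a power of two >= 2")
    return 2 * (n.bit_length() - 1) - 1


def benes_switches(n: int) -> int:
    """Number of 2x2 switches: n/2 per stage, growing as n*log2(n)."""
    return (n // 2) * benes_stages(n)


def crossbar_crosspoints(n: int) -> int:
    """Number of crosspoints in an n x n crossbar, growing as n**2."""
    return n * n
```

For the sub-network sizes listed in Table B, `benes_stages` returns 7, 9, 11, 13 and 15, which coincides with the Levels Reduced column.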

[0068] Thus the replacement of the 16×16, 32×32, or even the 64×64 sub-networks in the described interconnect networks with crossbars is a viable and attractive option. Nonetheless, the option which should be selected depends on the constraints of the specific application.

[0069] Parameterized Array Generation

[0070] In accordance with the present invention, the described interconnect network has several options which trade off area, latency, and throughput. To take advantage of this flexibility, different families of FPGA products, each family optimized for different design objectives, may be created. Perhaps a better way is to provide an FPGA array generator program to the end user. Such a generator-based methodology allows the user to explore various tradeoffs for his or her specific application. In addition, the generator allows the end user to specify the size and shape of the array desired. This enables the user to fit an FPGA component onto a larger VLSI chip floorplan with other components, a further advantage of the present invention.

[0071] A summary of the features of the interconnect network options that have been described so far is listed in Table C:

TABLE C

Objective             Corner Turning    Crossbars    Latches
Typical               Yes               No           No
Minimum Area          No                No           No
Minimum Latency       Yes               Yes          No
Maximum Throughput    No                Yes          Yes

[0072] Each of these options is a control on the software generator having a top level flow chart as shown in FIG. 12, in accordance with the present invention. The Corner Turning control either includes or excludes the corner turning MUX's and their configuration bits for all switch cells. The Crossbar control offers a set of choices (e.g., 4, 8, 16, or 32) for the level of sub-network to be replaced with crossbars. The Latch control offers a set of choices (e.g., 0, 1, 2, or 3) for the number of levels of unlatched switches between latched switches. In addition, the generator offers a “novice user” mode for users who are not familiar with the details of the described interconnect network. In the novice user mode, there are only three choices: Minimum Area, Minimum Latency, or Maximum Throughput. Selecting one of these choices instructs the generator to set the Corner Turning, Crossbars, and Latches options with appropriate defaults.
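The novice user mode can be sketched as a lookup from objective to option settings, using the Yes/No values of Table C. The class and function names are hypothetical, and because Table C gives only Yes/No values, the sketch does not guess which crossbar size or latch spacing the generator would choose as its default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkOptions:
    corner_turning: bool  # include corner turning MUX's and configuration bits
    crossbars: bool       # replace low-level sub-networks with crossbars
    latches: bool         # latch switch levels for pipelining


# Yes/No settings copied from Table C for the three novice-mode objectives.
NOVICE_PRESETS = {
    "Minimum Area": NetworkOptions(False, False, False),
    "Minimum Latency": NetworkOptions(True, True, False),
    "Maximum Throughput": NetworkOptions(False, True, True),
}


def novice_options(objective: str) -> NetworkOptions:
    """Return the Table C settings for a novice-mode objective."""
    try:
        return NOVICE_PRESETS[objective]
    except KeyError:
        raise ValueError(f"unknown objective: {objective!r}") from None
```

An expert user would instead set each control directly, for example choosing a specific sub-network size for the Crossbar control.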

[0073] Besides the above options for the generation of the interconnect network, the generator also accepts parameters for the other components of the FPGA array. The user can specify the total number of primary IO's (Input/Output terminals) for the array. Optionally, the number of IO's per side (north, east, west, or south) can be specified. If the number of IO's per side is not specified, the IO's are evenly distributed around the array. In addition, if the sides have been specified, a list of the exact offset location for each IO may optionally be specified. The generator performs all the necessary design rule checks. For the logic cells, the user can specify the total number of logic cells desired and the generator then rounds up to the nearest power of two. After the number of logic cells is specified, the generator offers a choice of feasible layouts with their various width and height dimensions. Optionally, the user may specify either a maximum width or a maximum height, and the generator automatically selects the layout which most closely conforms to this constraint.
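The sizing rules in this paragraph can be sketched as below. Several details are assumptions made for illustration only: layouts are expressed as (columns, rows) pairs with both dimensions a power of two, "most closely conforms" is interpreted as the widest layout that does not exceed the maximum width, leftover IO's are handed out starting from the north side, and all function names are hypothetical.

```python
def round_up_pow2(n: int) -> int:
    """Round a requested logic cell count up to the nearest power of two."""
    if n < 1:
        raise ValueError("at least one logic cell is required")
    return 1 << (n - 1).bit_length()


def feasible_layouts(num_cells: int) -> list[tuple[int, int]]:
    """All (columns, rows) pairs whose product is the rounded-up cell count."""
    total = round_up_pow2(num_cells)
    exponent = total.bit_length() - 1
    return [(1 << c, total >> c) for c in range(exponent + 1)]


def pick_layout(num_cells: int, max_columns: int) -> tuple[int, int]:
    """Assumed rule: the widest layout that still fits within max_columns."""
    fitting = [lay for lay in feasible_layouts(num_cells) if lay[0] <= max_columns]
    return max(fitting)


def distribute_ios(total_ios: int) -> dict[str, int]:
    """Spread IO's as evenly as possible over the four sides."""
    base, extra = divmod(total_ios, 4)
    sides = ("north", "east", "south", "west")
    return {side: base + (1 if i < extra else 0) for i, side in enumerate(sides)}
```

For example, a request for 1000 logic cells is rounded up to 1024, and with a limit of 40 columns the sketch selects a 32 by 32 layout.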

[0074] Lastly, the generator can be incorporated into a broader automated methodology which includes the user's design logic synthesis. From the output of the logic synthesis program, the methodology automatically determines the number of primary IO's and logic cells, and then invokes the generator with these parameters. An optional “fudge” factor can be specified by the user (e.g., 10%) to instruct the generator to create an array with the specified proportion of additional logic cells over the number required by the synthesized logic.

[0075] Because the array is field programmable, the user may wish to specify more logic cells than are absolutely required by the given design. These extra cells can be used in the field to accommodate future bug fixes and enhancements. It is even possible to accommodate a user who has only a general idea of the logic design, but can specify the maximum gate count anticipated, and does so in order to begin the manufacturing of his or her ASIC (Application Specific Integrated Circuit) before the final logic design is finished. It is also possible to accommodate the user who wants a single array to be able to accept more than one design. For example, the user may want his or her product to be able to interface with many alternative external memory devices, each requiring different protocols or timing, with the final selection of interface being field configurable.

[0076] Layout for FPGA Array

[0077] There are two viable floorplans to map the topology of the present invention's interconnect network onto a physical layout, i.e., on the surface of the substrate of an integrated circuit. The first floorplan is tree-based, and the second floorplan is column-based. Each floorplan has its own advantages and disadvantages.

[0078] The most straightforward mapping is column-based. With the previous illustrations of the Benes interconnect network and columns of logic cells added, the layout is nearly completed. See FIG. 13A. There are two logic cells 81 per switch cell 82, connected in the so-called “butterfly” pattern, consistent with the Benes network topology. More generally, an input-output pin-pair of a logic cell forms the leaves of the Benes interconnect network. So if a logic cell has many pins, there are a number of switch cells connected to it. For example, if the logic cell is a 4-input, 1-output Look Up Table (see FIG. 9), the single output pin is fanned out 4 times to form 4 pin-pairs for each cell, and there are 2 switch cells in the first level of the Benes network connecting to each logic cell.

[0079] For multiple column arrays, levels of switch cells are added to each column's sub-network. The new levels are connected together between columns in a topology consistent with a Benes network. See FIG. 13B, in which a level of switch cells 83 is added to make the connection between two FIG. 13A column arrays. For each additional doubling of the number of cell columns, an additional column of switch cells must be added to each of the cell columns, and the span for connecting these new columns to each other doubles as well. As before, each column's top-level inputs and outputs are connected to the primary I/O of the FPGA.
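The doubling rule can be sketched as a list of inter-column connection spans, one entry per added level of switch cells. The unit of span (one cell-column pitch) and the function name are assumptions made for illustration; the starting span of one column for a two-column array follows FIG. 13B only loosely.

```python
def inter_column_spans(num_columns: int) -> list[int]:
    """Span, in cell-column pitches, of each added inter-column switch level.

    One level is added per doubling of the column count, and each new
    level spans twice as far as the previous one.
    """
    if num_columns < 1 or num_columns & (num_columns - 1):
        raise ValueError("the column count is assumed to be a power of two")
    spans = []
    span = 1
    while span < num_columns:
        spans.append(span)
        span *= 2
    return spans
```

Under these assumptions the longest connection in an array of C columns spans C/2 columns, that is, half the width of the array.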

[0080] The strength of this column floorplan is that the number of cells in a column can be any power of 2, and the number of rows can also independently be any power of 2. This enables the generation of arrays containing numbers of cells that are any power of 2, and with a selection of various aspect ratios. On the other hand, the weakness of this column floorplan is that the long inter-column connections for several levels can all pass in parallel over the same area. The floorplan may be limited by the metal pitch constraints of the semiconductor process used to manufacture the FPGA; and the floorplan may also have crosstalk problems. These issues must be addressed carefully in the leaf cell design for the software generator.

[0081] The other viable floorplan maps the Benes topology onto a hierarchical tree layout. It is most clearly understood by showing the cell-to-cell correspondence with the column floorplan. FIG. 14A shows a 16 logic cell column floorplan (the same as shown in FIG. 13B) with the cells labeled for identification; FIG. 14B shows the equivalent network laid out in a hierarchical tree floorplan. The advantage of this tree floorplan is that the maximum wire length for any connection is only a quarter of the width of the substrate surface, whereas in the column floorplan the maximum wire length is half the width of the chip. Furthermore, in the tree floorplan the switches can be “slid” along their hierarchical connection paths in order to evenly distribute the wire lengths between levels and thus to minimize the longest wire length. FIG. 14B is used as an example of sliding the switches towards the center, where the maximum wire length is now 2, instead of the original 3. The resulting layout is illustrated in FIG. 14C.

[0082] This rearrangement, in turn, minimizes the size of the circuit drivers in the switch cells. This can be significant if the same switch cell is used everywhere in the generator. Minimizing the longest wire is also significant in pipeline operation because the clock rate is limited by the slowest level in the network. On the other hand, the disadvantage of this tree floorplan is that it does not pack the substrate surface perfectly and leaves some open spaces. Additionally, the aspect ratio of the array is fixed.

[0083] All these various floorplans still implement the same topology of the disclosed interconnect network. In fact, a straightforward software method can mechanically transform between the various floorplans, even after place-and-route has been performed. Other than the physical locations of the cells, the only remaining question is the delay of the physical interconnect wires. This can be approximated with simple resistance and capacitance models since interconnect wires have no branches. These simple models cannot account for the crosstalk and interlayer parasitics, but they should be sufficient for all the design stages before full-chip verification.

[0084] Applicability to MPGA

[0085] Finally, the disclosed Benes interconnect network can also be applied to MPGAs (Mask Programmable Gate Arrays). This is accomplished as a post-processing step where each switch cell used in the routing is replaced with either a metal via or an end-to-end concatenation of two same-layer metal wires, depending on the orientation of the wires. The advantages over existing MPGA interconnect architectures are the guaranteed routability, the support for pipelining, and the fast execution speed of the place-and-route algorithms.

[0086] While the foregoing is a complete description of the embodiments of the invention, it should be evident that various modifications, alternatives and equivalents may be made and used. For example, while the foregoing description is that of an FPGA integrated circuit, the present invention works equally well in an FPGA which forms only a portion of an integrated circuit. Furthermore, while logic cells are interconnected in an FPGA, the interconnection network of the present invention may be used to interconnect arbitrary components, such as multiple processors or peripheral blocks, of an integrated circuit. In fact, the interconnection network might even be on a separate integrated circuit and be used to interconnect separate integrated circuit devices. Accordingly, the above description should not be taken as limiting the scope of the invention which is defined by the metes and bounds of the appended claims.

What is claimed is:
 1. An integrated circuit comprising a plurality of logic cells; and a programmable network interconnecting said logic cells, said programmable interconnection network having a plurality of interconnection network input terminals; a plurality of interconnection network output terminals to said interconnection network input terminals; a plurality of programmable switches, each programmable switch having a plurality of input terminals and a number of output terminals, said programmable switch arranged so that signals on any input terminal are passed to any output terminal; said plurality of programmable switches interconnecting said plurality of interconnection network input terminal to said interconnection network output terminal, said plurality of programmable switches arranged in a Benes network so that connections between said interconnection network input terminals and interconnection network output terminals are rearrangeable.
 2. The integrated circuit of claim 1 wherein the number of input terminals for each of said programmable switches is two.
 3. The integrated circuit of claim 2 wherein each programmable switch comprises first and second multiplexers, each multiplexer having two input nodes and an output node, each input node connected to one of said input terminals and each output node connected to one of said output terminals, said first and second multiplexers responsive to a control signal so that when said first multiplexer passes a signal on said first input terminal to an output terminal connected to said output node of said first multiplexer, said second multiplexer passes a signal on said second input terminal to an output terminal connected to said output node of said second multiplexer, and when said first multiplexer passes a signal on said second input terminal to said output terminal connected to said output node of said first multiplexer, said second multiplexer passes a signal on said first input terminal to said output terminal connected to said output node of said second multiplexer.
 4. The integrated circuit of claim 1 wherein said plurality of programmable switches are arranged in hierarchical levels, a first level of said programmable switches having input terminals connected to said interconnection network input terminals and a last level of said programmable switches having output terminals connected to said interconnection network output terminals, said levels of said programmable switches intermediate said first and last level arranged in a plurality of first rank sub-interconnection networks equal to the number of switch output terminals, each first rank sub-interconnection network connected to an output terminal of each programmable switch in said first level and connected to an input terminal of each programmable switch in said last level.
 5. The integrated circuit of claim 4 wherein said first rank sub-interconnection networks each comprise a second level of programmable switches having input terminals connected to said output terminals of said first level programmable switches and a second-to-the-last level of said programmable switches having output terminals connected to said input terminals of said last level programmable switches; and said levels of said programmable switches intermediate said second and second-to-the-last level arranged in a plurality of second rank sub-interconnection networks equal to the number of switch output terminals, each second rank sub-interconnection network connected to an output terminal of each programmable switch in said second level and connected to an input terminal of each programmable switch in said second-to-the-last level.
 6. The integrated circuit of claim 4 wherein each rank sub-interconnection network comprises a plurality of programmable switches of one level, said programmable switches having input terminals connected to output terminals of programmable switches of a sub-interconnection network of a rank immediately higher; a plurality of programmable switches of a level corresponding to said one level, said programmable switches having output terminals connected to input terminals of programmable switches of said sub-interconnection network of said rank immediately higher; and said levels of said programmable switches intermediate said switches of one and corresponding levels arranged in a plurality of sub-interconnection networks of rank immediately lower, said sub-interconnection networks equal to the number of switch output terminals, each sub-interconnection network connected to an output terminal of each programmable switch in said one level and connected to an input terminal of each programmable switch in said corresponding level, to define said hierarchical level arrangement of programmable switches.
 7. The integrated circuit of claim 6 wherein the number of input terminals for each of said programmable switches is two.
 8. The integrated circuit of claim 1 wherein said integrated circuit comprises an FPGA.